Distributed approximate KNN Graph construction for high dimensional Data
Abstract
Constructing nearest-neighbor graphs is a crucial problem for many applications, notably those involving machine learning and data mining algorithms. Although some existing work addresses the problem in centralized environments, these approaches remain limited by the growing volume of data and by its dimensionality. In this paper, we propose a method based on hash functions for constructing nearest-neighbor graphs. The proposed method is distributable and scalable, both in data volume and in dimensionality. Moreover, the use of a new family of hash functions, RMMH, guarantees load balancing in parallel and distributed environments.
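To make the hashing idea concrete, here is a minimal single-machine sketch of bucket-based approximate kNN graph construction. It is not the paper's method: the abstract does not describe RMMH, so the sketch substitutes plain random-hyperplane hashing, and the names `hash_code`, `sq_dist`, `approx_knn_graph`, `n_bits`, and `n_tables` are hypothetical, chosen only for illustration. Points that share a hash bucket become neighbor candidates, and each bucket can be processed independently, which is what makes this family of approaches easy to distribute.

```python
import random
from collections import defaultdict


def hash_code(point, hyperplanes):
    """Return one bit per hyperplane: 1 if the point lies on its positive side."""
    return tuple(
        1 if sum(p * h for p, h in zip(point, plane)) >= 0 else 0
        for plane in hyperplanes
    )


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def approx_knn_graph(points, k, n_bits=4, n_tables=8, seed=0):
    """Approximate kNN graph: candidates come only from shared hash buckets.

    Assumption: random-hyperplane hashing stands in for the RMMH family
    named in the abstract, whose definition is not given there.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    candidates = [set() for _ in points]

    for _ in range(n_tables):
        # A fresh set of random hyperplanes defines one hash table.
        hyperplanes = [
            [rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)
        ]
        buckets = defaultdict(list)
        for i, p in enumerate(points):
            buckets[hash_code(p, hyperplanes)].append(i)

        # Buckets are independent, so in a distributed setting each one
        # could be handed to a different worker.
        for members in buckets.values():
            for i in members:
                candidates[i].update(j for j in members if j != i)

    # Keep the k closest candidates for each point (ties broken by index).
    graph = {}
    for i, cand in enumerate(candidates):
        ranked = sorted(cand, key=lambda j: (sq_dist(points[i], points[j]), j))
        graph[i] = ranked[:k]
    return graph
```

In this sketch the work per bucket grows quadratically with bucket size, so heavily skewed buckets would dominate the running time. That is the load-balancing concern the abstract says RMMH is designed to address.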
Similar resources
Distributed computation of the knn graph for large high-dimensional point sets
High-dimensional problems arising from robot motion planning, biology, data mining, and geographic information systems often require the computation of k nearest neighbor (knn) graphs. The knn graph of a data set is obtained by connecting each point to its k closest points. As the research in the above-mentioned fields progressively addresses problems of unprecedented complexity, the demand for...
Fast Approximate kNN Graph Construction for High Dimensional Data via Recursive Lanczos Bisection
Nearest neighbor graphs are widely used in data mining and machine learning. A brute-force method to compute the exact kNN graph takes Θ(dn^2) time for n data points in the d dimensional Euclidean space. We propose two divide and conquer methods for computing an approximate kNN graph in Θ(dn^t) time for high dimensional data (large d). The exponent t ∈ (1,2) is an increasing function of an intern...
EFANNA : An Extremely Fast Approximate Nearest Neighbor Search Algorithm Based on kNN Graph
Approximate nearest neighbor (ANN) search is a fundamental problem in many areas of data mining, machine learning and computer vision. The performance of traditional hierarchical structure (tree) based methods decreases as the dimensionality of data grows, while hashing based methods usually lack efficiency in practice. Recently, the graph based methods have drawn considerable attention. The ma...
Fast kNN Graph Construction with Locality Sensitive Hashing
The k nearest neighbors (kNN) graph, perhaps the most popular graph in machine learning, plays an essential role for graph-based learning methods. Despite its many elegant properties, the brute force kNN graph construction method has computational complexity of O(n^2), which is prohibitive for large scale data sets. In this paper, based on the divide-and-conquer strategy, we propose an efficient a...
Exploring Bit-Difference for Approximate KNN Search in High-dimensional Databases
In this paper, we develop a novel index structure to support efficient approximate k-nearest neighbor (KNN) query in high-dimensional databases. In high-dimensional spaces, the computational cost of the distance (e.g., Euclidean distance) between two points contributes a dominant portion of the overall query response time for memory processing. To reduce the distance computation, we first propo...